Volumetric Correspondence Networks for Optical Flow

Neural Information Processing Systems


Reviews: Volumetric Correspondence Networks for Optical Flow

Neural Information Processing Systems

I vote for rejecting this submission from NeurIPS 2019. First, it has severe presentation issues. Besides a fairly high number of typos and formatting issues, the explanations in the paper are at times lacking and counterintuitive, especially in the method section (Section 3). The overview of the network architecture in Figure 3 needs to be explained more thoroughly so that it stands on its own. Regarding the method itself, the contribution over prior work is unfortunately not entirely clear to me. The main claim for novelty is the application of truly 4D cost volume processing, in contrast to state-of-the-art methods that "reshape the 4D cost volume as a multichannel 2D array with N U V channels" [9,...] (l. …). However, FlowNet [9], which the authors refer to in this context, does nothing of the sort; instead, it extracts patch-wise features for both input images and correlates pairs of patches to obtain a notion of correspondence.
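
The review contrasts processing the cost volume as a genuine 4D tensor with reshaping it into a 2D array whose channel count is N·U·V. The NumPy sketch below illustrates what that reshape does; the sizes are arbitrary and the helper `as_2d_multichannel` is hypothetical, taken neither from the paper nor from FlowNet [9]. The reshape loses no data, but afterwards the displacement axes u and v survive only as part of a channel index, so a 2D convolution over the result has no built-in notion of which displacements are neighbours.

```python
import numpy as np

# Arbitrary illustrative sizes: N similarity channels, a U x V search window,
# and an H x W grid of pixels.
N, U, V, H, W = 4, 9, 9, 16, 20

# Random stand-in for a multi-channel 4D cost volume with axes (n, u, v, y, x).
cost = np.random.default_rng(0).standard_normal((N, U, V, H, W)).astype(np.float32)


def as_2d_multichannel(volume):
    """Hypothetical helper: fold (N, U, V, H, W) into (N*U*V, H, W).

    The data is unchanged, but a 2D convolution over the result treats the
    N*U*V slices as unordered channels; the (u, v) neighbourhood structure of
    the search window is no longer an explicit axis it can slide over.
    """
    n, u, v, h, w = volume.shape
    return volume.reshape(n * u * v, h, w)


flat = as_2d_multichannel(cost)
print(flat.shape)  # (324, 16, 20)

# With NumPy's default C order, channel k of the flat array is the slice
# (n, u, v) with k = (n*U + u)*V + v.
n, u, v = 2, 3, 5
assert np.array_equal(flat[(n * U + u) * V + v], cost[n, u, v])
```

A 4D filter, by contrast, slides along u and v as well as y and x, which is the distinction the review identifies as the paper's main novelty claim.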


Reviews: Volumetric Correspondence Networks for Optical Flow

Neural Information Processing Systems

The proposal is very efficient in terms of speed, and it will be very useful in computer vision. The first reviewer remained unconvinced about the novelty and the clarity of the method. After discussion, the area chair recommends acceptance.


Volumetric Correspondence Networks for Optical Flow

Neural Information Processing Systems

Many classic tasks in vision, such as the estimation of optical flow or stereo disparities, can be cast as dense correspondence matching. Well-known techniques for doing so make use of a cost volume, typically a 4D tensor of match costs between all pixels in a 2D image and their potential matches in a 2D search window. However, such layers require significant amounts of memory and compute, making them cumbersome to use in practice. As a result, SOTA networks also employ various heuristics designed to limit volumetric processing, leading to limited accuracy and overfitting. Instead, we introduce several simple modifications that dramatically simplify the use of volumetric layers: (1) volumetric encoder-decoder architectures that efficiently capture large receptive fields, (2) multi-channel cost volumes that capture multi-dimensional notions of pixel similarities, and finally, (3) separable volumetric filtering that significantly reduces computation and parameters while preserving accuracy. Our innovations dramatically improve accuracy over SOTA on standard benchmarks while being significantly easier to work with: training converges in 10X fewer iterations, and, most importantly, our networks generalize across correspondence tasks.
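
The abstract describes a cost volume as a 4D tensor of match costs between every pixel and a 2D search window, and proposes multi-channel cost volumes that store a vector of similarities per candidate match. Here is a minimal NumPy sketch built on several assumptions. The hypothetical function `multichannel_cost_volume` splits the C feature channels into `groups` slices and scores each slice with cosine similarity, and any displacement that leaves the image scores 0. The paper's actual similarity function and border handling may differ.

```python
import numpy as np


def multichannel_cost_volume(f1, f2, radius, groups):
    """Hypothetical helper: build a (groups, U, V, H, W) cost volume, U = V = 2*radius + 1.

    f1, f2: feature maps of shape (C, H, W); C must be divisible by `groups`.
    Entry [:, u, v, y, x] is a `groups`-dimensional vector of cosine
    similarities between pixel (y, x) in f1 and pixel
    (y + u - radius, x + v - radius) in f2. Matches outside f2 score 0.
    """
    C, H, W = f1.shape
    assert f2.shape == f1.shape and C % groups == 0
    D = 2 * radius + 1
    # Split channels into groups and L2-normalise each group, so the dot
    # products below are cosine similarities.
    g1 = f1.reshape(groups, C // groups, H, W)
    g2 = f2.reshape(groups, C // groups, H, W)
    g1 = g1 / (np.linalg.norm(g1, axis=1, keepdims=True) + 1e-8)
    g2 = g2 / (np.linalg.norm(g2, axis=1, keepdims=True) + 1e-8)
    # Zero-pad f2 spatially so every shifted window stays in bounds.
    g2p = np.pad(g2, ((0, 0), (0, 0), (radius, radius), (radius, radius)))
    cost = np.zeros((groups, D, D, H, W), dtype=g1.dtype)
    for i in range(D):          # vertical displacement dy = i - radius
        for j in range(D):      # horizontal displacement dx = j - radius
            shifted = g2p[:, :, i:i + H, j:j + W]
            cost[:, i, j] = np.sum(g1 * shifted, axis=1)
    return cost


rng = np.random.default_rng(0)
f1 = rng.standard_normal((8, 12, 16)).astype(np.float32)
# Matching an image with itself: the zero-displacement slice should be all ~1.
vol = multichannel_cost_volume(f1, f1, radius=3, groups=2)
print(vol.shape)                                   # (2, 7, 7, 12, 16)
print(np.allclose(vol[:, 3, 3], 1.0, atol=1e-5))   # True
```

The output holds groups × (2·radius + 1)² × H × W numbers, so at every pixel it grows with the square of the search radius. That growth is the memory cost the abstract refers to.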

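The abstract credits separable volumetric filtering with reducing computation and parameters. One plausible way to realize this, assumed here rather than taken from the paper (whose exact factorization, ordering and channel mixing may differ), is to replace a k×k×k×k kernel over (u, v, y, x) with a k×k filter over the displacement axes followed by a k×k filter over the image axes. The NumPy sketch below uses hypothetical helpers (`correlate_nd`, `full_4d_params`, `separable_params`). It checks that when a 4D kernel is the product a(u, v)·b(y, x), the two 2D passes reproduce the full 4D filter, and it compares weight counts for the two layouts.

```python
import itertools

import numpy as np


def correlate_nd(x, kernel):
    """Zero-padded, 'same'-size N-D cross-correlation for one channel.

    Every kernel size must be odd so the kernel has a centre tap.
    """
    pad = [(k // 2, k // 2) for k in kernel.shape]
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    # Accumulate one shifted copy of the padded input per kernel tap.
    for offs in itertools.product(*(range(k) for k in kernel.shape)):
        window = tuple(slice(o, o + n) for o, n in zip(offs, x.shape))
        out += kernel[offs] * xp[window]
    return out


def full_4d_params(c_in, c_out, k):
    """Weights (no bias) of one k x k x k x k convolution."""
    return c_in * c_out * k**4


def separable_params(c_in, c_out, k):
    """Weights (no bias) of a k x k conv over (u, v), then a k x k conv over (y, x)."""
    return c_in * c_out * k**2 + c_out * c_out * k**2


rng = np.random.default_rng(0)
vol = rng.standard_normal((7, 7, 10, 12))  # single-channel cost volume, axes (u, v, y, x)
a = rng.standard_normal((3, 3))            # filter over displacements (u, v)
b = rng.standard_normal((3, 3))            # filter over pixels (y, x)

# One full 3x3x3x3 filter whose taps are the products a[i, j] * b[k, l] ...
full = correlate_nd(vol, np.einsum("ij,kl->ijkl", a, b))
# ... equals a (u, v) pass followed by a (y, x) pass.
sep = correlate_nd(correlate_nd(vol, a[:, :, None, None]), b[None, None, :, :])

print(np.allclose(full, sep))         # True
print(full_4d_params(16, 16, 3))      # 20736
print(separable_params(16, 16, 3))    # 4608
```

With c input and c output channels, the full kernel needs c²·k^4 multiply-adds per position of the volume, versus 2·c²·k² for the two passes (81 versus 18 per channel pair when k = 3). In the single-channel case, the factorized layer can only express product-form kernels like the one above, which is the trade-off a separable design accepts.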

Volumetric Correspondence Networks for Optical Flow

Gengshan Yang, Deva Ramanan

Neural Information Processing Systems